Abstract. We consider the evolution of an asexually reproducing population in an
uncorrelated random fitness landscape in the limit of infinite genome size, which
implies that each mutation generates a new fitness value drawn from a probability
distribution $g(w)$. This is the finite population version of Kingman's house of cards
model [J.F.C. Kingman, J. Appl. Probab. 15, 1 (1978)]. In contrast to Kingman's
work, the focus here is on unbounded distributions $g(w)$ which lead to an indefinite
growth of the population fitness. The model is solved analytically in the limit of infinite
population size $N \to \infty$ and simulated numerically for finite $N$. When the genome-wide
mutation probability $U$ is small, the long time behavior of the model reduces to a
point process of fixation events, which is referred to as a diluted record process (DRP).
The DRP is similar to the standard record process except that a new record candidate
(a number that exceeds all previous entries in the sequence) is accepted only with a
certain probability that depends on the values of the current record and the candidate.
We develop a systematic analytic approximation scheme for the DRP. At finite $U$ the
fitness frequency distribution of the population decomposes into a stationary part due
to mutations and a traveling wave component due to selection, which is shown to imply
a reduction of the mean fitness by a factor of $1 - U$ compared to the $U \to 0$ limit.

1. Introduction

A fruitful exchange of concepts and methods has taken place between evolutionary
population biology and the statistical physics of disordered systems over the past several
decades [1, 2, 3]. On the most basic level, one imagines that a biological population
evolves by searching a high-dimensional landscape for fitness peaks, in much the same
way as a disordered system relaxes towards its low energy configurations [4, 5]. Not
surprisingly, extremal statistics arguments play a prominent role in both contexts
[6, 7, 8, 9, 10, 11]. To make the analogy more precise, we note that the inheritable
characters of an individual (its genotype) are encoded in a genetic sequence (consisting
of nucleotide letters or the alleles of genes), which for many purposes can be reduced
to a binary sequence $\sigma = (\sigma_1, \ldots, \sigma_L)$ of fixed length $L$. For a statistical physicist it
is very natural to assign the values $\sigma_i = \pm 1$ to the letters, and to treat the sequence
as, e.g., a row of spins in the two-dimensional Ising model [12] or a configuration of a
quantum spin chain [13]. A fitness landscape is then a real-valued function $W(\sigma)$ on the
$L$-dimensional sequence space, analogous to the (negative) energy of the spin system.

The notion of a fitness landscape is a venerable and persistent image in evolutionary
biology [14, 15], but it has always been plagued by a certain elusiveness, in the sense
that very little is known about the fitness landscapes in which real organisms evolve.
This situation may eventually change, as the experimental mapping of genotypic fitness
becomes feasible for simple microbial systems [16, 17]. Meanwhile it is reasonable
to handle our ignorance of real fitness landscapes by treating $W(\sigma)$ as a realization
of a suitably chosen ensemble of random functions. This approach was pioneered
by Kauffman and coworkers [7, 8], who introduced the NK family of random fitness
landscapes which are closely analogous to Derrida's p-spin model of spin glasses
developed a few years earlier [6, 18]. Two limiting cases of the model are of interest here:
the random energy model (REM), in which fitness (or energy) is assigned randomly
without correlations to the genotypes (or spin configurations), and the case in which the
letters $\sigma_i$ contribute independently (multiplicatively for discrete time dynamics [H]) to
the fitness. In the latter case there is always a single fitness maximum, which explains
why this is also referred to as the Fujiyama landscape. In the evolutionary context
deviations from multiplicative fitness are associated with epistasis [16]. Within the NK
family, the REM landscape is maximally epistatic [19].

In previous work the evolutionary process in the REM landscape has been studied
mostly in the limit of infinite population size, where fluctuations due to sampling noise
(also known as genetic drift in population genetics) are ignored [15]. While this allows
one to derive a rather complete picture of both stationary [20, 21] and time-dependent
[22, 23, 24, 25] properties of the model, the assumption of an infinite population
is unrealistic, because the number of possible genotypes $2^L$ exceeds any conceivable
population size $N$ already for moderate values of $L$. On the other hand, individual-based
simulation studies which explicitly follow the population through sequence space
are restricted to rather short sequences [26].
In this contribution we therefore propose to perform the limit of infinite sequence
length at finite population size $N$. This kind of limit is well known in population
genetics, where it is viewed alternatively as a limit on the number of genetic loci or sites
(the sequence length $L$ in our setting) or as a limit on the number of alleles (the number
of possible values that the variables $\sigma_i$ can take). Although mathematically distinct,
the two variants are equivalent for our purposes.

The implementation of the infinite sites limit for the REM landscape is 
straightforward: For $L \to \infty$ every mutation leads to a new genotype, whose fitness
can be randomly generated without need to keep track of the neighborhood relations 
of the sequence space. In this way large populations can be simulated efficiently for 
many generations. Moreover, we show that for long times and small mutation rates the 
model reduces to a simple point process which is partly tractable analytically. We also 
present an analytic solution of the infinite population limit of the model, which is useful 
for describing the evolution of finite populations at early times and provides important 
insights into the structure of the fitness distribution. 

The infinite population version of our model was introduced by Kingman in 1978
[28] and is known as the house of cards model [29]. This term refers to the idea
that the genetic organization of an organism is very fragile, such that it is completely
disrupted by any change in the genome, which therefore leads to a new fitness that
is uncorrelated with the fitness of the parent genotype. Previous studies of the finite
population dynamics of this model have been concerned mostly with the regime of weak
selection, where a balance is established between deleterious and beneficial mutations
[30, 31, 32]. In contrast, in the present paper we consider the strong selection regime
where the population fitness increases without bound (see section 4.1 for a definition of
these regimes).

We give a brief outline of the paper. In the next section we introduce the basic
Wright-Fisher dynamics for asexually reproducing populations of constant size $N$,
and describe how to implement the infinite sites limit. The evolution of the fitness
distribution in the deterministic limit $N \to \infty$ is discussed in section 3. In section 4
we turn to the long-time behavior of finite populations. The key observation is that
(for unbounded fitness distributions) fixation events in which a favorable mutation
spreads in the population become rare and well-separated as the mean fitness increases.
The evolution then reduces to a simple point process closely related to the dynamics
of records P [HI [331 El]. We show that to leading order the mean population
fitness achieved up to time $t$ is given by the mutation of largest fitness that has
been encountered, and we develop an analytic framework to compute sub-leading
contributions to the mean fitness and the fitness variance. Finally, conclusions and
some open problems are presented in section 5.
Figure 1. Schematic of the WF model explained in the text. The population size
is fixed at $N = 5$. Time direction is indicated by the arrow (5 generations). Circles
signify the individuals and different colors mean different genotypes which arise by
mutation with probability $U$. The vertical location of individuals has no significance
(the population is assumed to be well-mixed, without spatial structure). Initially, all
individuals have the same genotype (hence the same color). Line segments connecting
individuals show who begets whom. Each individual can have only one parent but a
parent can have many offspring. After five generations, the red mutation which arose
in a single individual at time 2 is fixed in the population (see section 4.1 for further
discussion of the fixation process).



2. Wright-Fisher dynamics in the infinite sites limit

We consider a population of $N$ individuals reproducing asexually, in discrete,
non-overlapping generations. The basic Wright-Fisher (WF) dynamics [35, 36] of evolution
can be described as follows. Each individual $i$ is assigned a fitness $w_{i,t}$ ($i = 1, \ldots, N$)
at generation $t$. Initially, all individuals have the same genotype and accordingly the
same fitness $w_{i,0} = 1$. At every generation, all individuals are replaced. The probability
that a new individual is an offspring of parent $i$ is proportional to the parent's fitness
$w_{i,t}$. More precisely, this probability is $w_{i,t}/(\bar{w}_t N)$, where $\bar{w}_t = \sum_{i=1}^{N} w_{i,t}/N$ is the
mean fitness‡ at generation $t$. In the actual simulation, we do not discern different
progenitors if they have the same genotype. Instead, the number of progeny of a
given genotype is determined from the multinomial distribution with a probability also
proportional to the population of that genotype. The multinomially distributed numbers
are chosen by sampling correlated binomial random numbers; see, e.g., [37, 38]. Since
this reproduction scheme is invariant under multiplication of the fitness of all
individuals by a constant, setting the average initial fitness to unity implies no loss of
generality. Once all individuals are replaced, a mutation can change the genotype of each
one with probability $U$. A cartoon illustrating the WF model is depicted in figure 1.

‡ In this paper, we use three different notations for 'mean fitness'. First, $\bar{w}_t$ is a random variable defined
as the population average of the fitness in a specific realization of the WF model. Second, we use $\langle \bar{w}_t \rangle$
to denote the average of $\bar{w}_t$ over independent realizations; for infinite populations, this distinction is
unnecessary. Third, by $\bar{w}(\tau)$ we mean $\langle \bar{w}_t \rangle$ at $t = \tau/(NU)$ for given $N$ and $U$, to emphasize the similarity
between the simulation studies and the continuous time point process introduced in section 4.2.

In the infinite sites limit, every genotype occurring by mutations can appear only
once (there are no recurrent mutations [39]). In the REM landscape the fitness of the
mutant is a random number $w$ drawn from some fixed distribution $g(w)$, independent
of the number of sites affected by the mutation as well as of the parental genotype.
By contrast, the multiplicative fitness landscape is represented in the infinite sites limit
by drawing a random selection coefficient $s$ and generating the new fitness $w'$ from the
parental fitness $w$ by multiplication, $w' = (1 + s)w$ [37, 40, 41, 42].

The model is completely specified by the population size $N$, the mutation
probability $U$ and the choice of the mutation distribution $g(w)$. To make contact with
the evolutionary dynamics in a finite space of sequences of length $L$ [27], we note that
$U$ is the probability that at least one mutation occurs in a genotype. We therefore have
the relation

$$ U = 1 - (1 - \mu)^L \approx 1 - \exp(-\mu L), \tag{1} $$

where $\mu$ is the mutation probability per site, and the limit $\mu \to 0$, $L \to \infty$ is implied.
Typical values for $U$ range from 0.0025 for E. coli to 0.15 for humans and 0.9985 for
the bacteriophage Q$\beta$ [15].
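As a quick numerical check of (1), the following sketch compares the exact expression with its exponential approximation. The function names are illustrative, and the per-site probability and genome length in the usage note are hypothetical values chosen only to show the agreement, not data from the text.

```python
import math

def genome_mutation_probability(mu, L):
    """Exact form of (1): probability of at least one mutation among L sites."""
    # expm1/log1p keep precision when mu is tiny and L is large
    return -math.expm1(L * math.log1p(-mu))

def exponential_approximation(mu, L):
    """Approximate form of (1), valid for mu -> 0, L -> infinity at fixed mu*L."""
    return -math.expm1(-mu * L)
```

For instance, with the hypothetical values `mu = 1e-6` and `L = 10**5` both functions return about 0.0952, and the difference shrinks further as `mu` decreases at fixed `mu * L`.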

An important difference between the models with finite and infinite genome size
is that in the latter case there are no local fitness maxima in which populations can
get trapped when $U$ is small [3, El [IS [27|. Hence we expect an indefinite increase of
the population mean fitness $\bar{w}_t$ when the fitness distribution $g$ is unbounded. In the
following sections we explore how the behavior of $\bar{w}_t$ depends on the model parameters
and the choice of $g$. Some representative results from simulations with different values
of $N$ and $U$ and the exponential mutation distribution

$$ g(w) = e^{-w} \tag{2} $$

are shown in figure 2. The majority of our results were obtained with the choice (2).
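A minimal sketch of how the dynamics of this section could be simulated for a small population, assuming the exponential mutation distribution (2). The names `wf_generation` and `simulate` are illustrative and not taken from the paper. Parents are drawn with `random.choices`, which samples the same multinomial offspring numbers but is a simple stand-in for the correlated binomial scheme mentioned in the text.

```python
import random
from collections import Counter

def wf_generation(classes, N, U, rng):
    """Advance one generation. `classes` is a list of (fitness, count), one per genotype."""
    fitness = [w for w, _ in classes]
    # selection: a genotype is chosen as parent with weight fitness * abundance
    weights = [w * n for w, n in classes]
    offspring = Counter(rng.choices(range(len(classes)), weights=weights, k=N))
    new_classes = []
    for idx, n_off in offspring.items():
        # each offspring mutates independently with probability U
        n_mut = sum(1 for _ in range(n_off) if rng.random() < U)
        if n_off > n_mut:
            new_classes.append((fitness[idx], n_off - n_mut))
        # infinite sites + REM: every mutant is a new genotype whose fitness
        # is drawn afresh from g(w) = exp(-w), independent of the parent
        new_classes.extend((rng.expovariate(1.0), 1) for _ in range(n_mut))
    return new_classes

def mean_fitness(classes):
    total = sum(n for _, n in classes)
    return sum(w * n for w, n in classes) / total

def simulate(N=200, U=0.01, generations=50, seed=0):
    """Return the mean fitness trajectory of one realization, starting from w = 1."""
    rng = random.Random(seed)
    classes = [(1.0, N)]
    trajectory = [mean_fitness(classes)]
    for _ in range(generations):
        classes = wf_generation(classes, N, U, rng)
        trajectory.append(mean_fitness(classes))
    return trajectory
```

Averaging the output of `simulate` over many seeds gives an estimate of $\langle \bar{w}_t \rangle$. Because the cost per generation grows linearly with $N$ in this sketch, the grouped binomial sampling described in the text would be the better choice for large populations.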

3. Infinite populations 

In this section, we consider the infinite population dynamics in the REM fitness
landscape with an infinite number of sites. In this paper, we take the infinite site
limit $L \to \infty$ first, followed by the infinite population limit $N \to \infty$; note that these
two limiting procedures do not obviously commute.

3.1. Calculation of the mean fitness 

Let $f_t(w)\,dw$ be the fraction of individuals in the population with a fitness between
$w$ and $w + dw$. In the limit of infinite population size this evolves deterministically
according to [28]

$$ f_{t+1}(w) = (1-U)\,\frac{w f_t(w)}{\bar{w}_t} + U g(w), \tag{3} $$
Figure 2. Mean fitness of the WF model for finite and infinite populations. For finite
populations, the mean fitness is also averaged over independent runs. The mutation
probability $U$ is set to 0.01 and the mutation distribution is $g(w) = e^{-w}$. From bottom
to top, the population size $N$ increases (N = 10^, 10^, 10^, and 10^). The number of
independent runs is 8 x 10^ (N = 10^), 8 x 10^ (N = 10^), 20 000 (N = 10^), and
4000 (N = 10^). Left panel shows the short time simulation results and compares them
to the infinite population calculation in section 3. Right panel depicts log-linear plots
of the mean fitness for finite populations, which clearly show a logarithmic increase of
the mean fitness at long times.



where $\bar{w}_t = \int_0^\infty dw\, w f_t(w)$ is the mean fitness. If $U = 1$, trivially and intuitively
$f_t(w) = g(w)$. The nonlinearity can be removed by introducing [15, 28]
$h_t(w) = f_t(w) \prod_{k=0}^{t-1} \bar{w}_k$ with $h_0(w) = f_0(w)$, which satisfies

$$ h_{t+1}(w) = (1-U)\, w\, h_t(w) + U g(w) X_{t+1}, \tag{4} $$

where

$$ X_{t+1} = \prod_{k=0}^{t} \bar{w}_k = \int_0^\infty dw\, h_{t+1}(w). \tag{5} $$

The last equality is due to the fact that $f_t(w)$ remains normalized. Formally, we can
solve the above equation such that

$$ \frac{h_t(w)}{(1-U)^t} = w^t f_0(w) + U g(w) \sum_{k=1}^{t} w^{t-k}\, \frac{X_k}{(1-U)^k}. \tag{6} $$

From the self-consistency condition (5), one can find a recursion relation for $X_t$,

$$ Y_t = \Phi_t + U \sum_{k=1}^{t} Y_k G_{t-k}, \tag{7} $$

where $Y_t = X_t/(1-U)^t$, and the quantities

$$ \Phi_t = \int_0^\infty dw\, f_0(w)\, w^t, \qquad G_t = \int_0^\infty dw\, g(w)\, w^t \tag{8} $$
are moments of the initial condition $f_0(w)$ and the fitness distribution $g(w)$, respectively.
The recursion relation for $Y_t$ can be rewritten in the form

$$ Y_t = \frac{1}{1-U}\left[\Phi_t + U \sum_{k=1}^{t-1} Y_k G_{t-k}\right] \tag{9} $$

(the $k = t$ term of the sum has been moved to the left-hand side using $G_0 = 1$),
which is (at least numerically) solvable by iteration. Once we have calculated $Y_t$, the
mean fitness follows naturally from

$$ \bar{w}_t = (1-U)\,\frac{Y_{t+1}}{Y_t}. \tag{10} $$
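The iteration can be sketched as follows, assuming the recursion in the form $Y_t = [\Phi_t + U\sum_{k=1}^{t-1} Y_k G_{t-k}]/(1-U)$ with $Y_0 = 1$, the initial condition $f_0(w) = \delta(w-1)$ (so that $\Phi_t = 1$) and the exponential distribution $g(w) = e^{-w}$ (so that $G_t = t!$). The function names are illustrative, and plain floating point limits this sketch to roughly $t \lesssim 150$ because $t!$ overflows beyond that.

```python
import math

def moments_exponential(t):
    """G_t for g(w) = exp(-w): the t-th moment is t!."""
    return float(math.factorial(t))

def moments_delta(t):
    """Phi_t for f_0(w) = delta(w - 1): every moment equals 1."""
    return 1.0

def mean_fitness_infinite_population(U, T, G=moments_exponential, Phi=moments_delta):
    """Iterate the recursion for Y_t and return [wbar_0, ..., wbar_{T-1}] from (10)."""
    Y = [Phi(0)]  # Y_0 = X_0 = 1 for a normalized initial distribution
    for t in range(1, T + 1):
        conv = sum(Y[k] * G(t - k) for k in range(1, t))
        Y.append((Phi(t) + U * conv) / (1.0 - U))
    return [(1.0 - U) * Y[t + 1] / Y[t] for t in range(T)]
```

With `U = 0.1` the returned list starts at 1 and approaches $(1-U)\,t$ at large $t$, which is the ratio $(1-U)\,G_t/G_{t-1}$ for the exponential case.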

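Although we cannot find a complete analytic expression for $\bar{w}_t$, the leading
asymptotic behavior can be easily found for some cases. If, for large $t$,

$$ G_{t-1} \gg \Phi_t \quad \text{and} \quad G_t \gg G_k G_{t-k} \ \text{ for all } 1 \le k \le t-1, \tag{11} $$

one can say with small error that

$$ Y_t \approx \frac{U}{1-U}\, Y_1 G_{t-1} \quad \Rightarrow \quad \bar{w}_t \approx (1-U)\,\frac{G_t}{G_{t-1}}. \tag{12} $$

The criterion (11) is actually not very restrictive. If $g(w)$ has the form of a stretched
exponential multiplied by a power law,

$$ g(w) = \frac{\beta}{w_0\, \Gamma\!\left(\frac{1+\nu}{\beta}\right)} \left(\frac{w}{w_0}\right)^{\nu} \exp\left[-\left(\frac{w}{w_0}\right)^{\beta}\right], \tag{13} $$

where $w_0$ is a constant and $\Gamma(x)$ is the gamma function, the $t$-th moment of $g(w)$ becomes

$$ G_t = w_0^t\, \frac{\Gamma\!\left(\frac{\nu + t + 1}{\beta}\right)}{\Gamma\!\left(\frac{\nu + 1}{\beta}\right)}, \tag{14} $$

which satisfies (11). Using Stirling's formula $\Gamma(z) \approx z^z e^{-z} \sqrt{2\pi/z}$, one finds that

$$ \frac{\bar{w}_t}{w_0 (1-U)} \approx \left(\frac{\nu + t + 1}{\beta}\right)^{1/\beta} \approx \left(\frac{t}{\beta}\right)^{1/\beta}, \tag{15} $$

a result that was also reported by Kingman [28].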
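To see how quickly the moment ratio approaches the power law, the sketch below evaluates $(1-U)\,G_t/G_{t-1}$ for the stretched exponential family, assuming the moments $G_t = w_0^t\,\Gamma((\nu+t+1)/\beta)/\Gamma((\nu+1)/\beta)$, and compares it with $(1-U)\,w_0\,(t/\beta)^{1/\beta}$. The function names are illustrative, and `math.lgamma` is used so that large $t$ does not overflow.

```python
import math

def log_moment(t, nu, beta, w0=1.0):
    """ln G_t for g(w) ~ (w/w0)^nu * exp(-(w/w0)^beta)."""
    return (t * math.log(w0)
            + math.lgamma((nu + t + 1) / beta)
            - math.lgamma((nu + 1) / beta))

def mean_fitness_from_moments(t, U, nu, beta, w0=1.0):
    """Asymptotic mean fitness (1 - U) * G_t / G_{t-1}."""
    return (1.0 - U) * math.exp(log_moment(t, nu, beta, w0)
                                - log_moment(t - 1, nu, beta, w0))

def mean_fitness_power_law(t, U, beta, w0=1.0):
    """Leading power law (1 - U) * w0 * (t / beta)^(1 / beta)."""
    return (1.0 - U) * w0 * (t / beta) ** (1.0 / beta)
```

For the exponential case (`nu = 0`, `beta = 1`) the moment ratio is exactly $t$, while for the Gaussian case (`nu = 0`, `beta = 2`) the two expressions differ by a relative correction of order $1/t$.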

When $\beta \to \infty$, one might expect that the power law (15) turns into a logarithmic
increase of the fitness. As an example of an unbounded distribution that decays more
rapidly than (13), we consider in Appendix A the Gumbel-type mutation distribution

$$ g(w) = \frac{1}{w_0} \exp\left(\frac{w}{w_0} - e^{w/w_0}\right). \tag{16} $$

Figure 3 compares the exact behavior obtained by iterating (9) to the asymptotic
expression (15) and the corresponding asymptotic result of Appendix A for the
exponential, Gaussian, and Gumbel-type distributions.
In this comparison, the initial fitness distribution is set to $f_0(w) = \delta(w - 1)$ and $w_0 = 1$.
The analytic approximations are seen to be very accurate in all cases, with error less
than 1 % after 100 generations.

Up to now, we have assumed that $G_t$ is finite for any $t$. If $g(w)$ has a power law
tail

$$ g(w) = \alpha (1 + w)^{-(\alpha + 1)}, \tag{17} $$
Figure 3. Log-log plots of mean fitness in the infinite population limit as a function of
generation $t$ for exponential [$\nu = 0$ and $\beta = 1$ in (13); upper three data sets], Gaussian
[$\nu = 0$ and $\beta = 2$ in (13); middle three data sets] and Gumbel-type distributions
(16) (lower three data sets). For all data sets, $w_0$ and $f_0(w)$ are set to 1 and $\delta(w - 1)$,
respectively. For each case, the mutation probability is $U = 0.5$ (red), $U = 0.1$ (purple),
and $U = 0.01$ (blue) from bottom to top. The lines are the approximate solutions (15)
and, for the Gumbel-type case, the asymptotic result of Appendix A, with the respective
parameters.



however, $G_t$ becomes infinite at $t = t_0 = \lceil \alpha \rceil$, where $\lceil x \rceil$ means the smallest integer
not smaller than $x$. This means that $Y_{t_0}$ is finite but $Y_{t_0+1}$ is infinite, i.e. $\bar{w}_t$ becomes
infinite at $t = t_0$. This peculiarity of the deterministic selection dynamics has also been
observed in the multiplicative case [37], and it is consistent with the behavior of the
infinite population dynamics in a finite sequence space, where the population reaches
the global fitness maximum in a single time step for large $L$ [23].

For mutation distributions $g(w)$ with bounded support the recursion (3) approaches
a limiting distribution $f_\infty(w)$ which has been described in detail by Kingman [28].
Remarkably, for certain choices of $g(w)$ one finds a condensation phenomenon in which
$f_\infty(w)$ develops a $\delta$-function singularity at the maximally possible fitness $w_{\max}$ (set by
the upper limit of the support of $g$ or of $f_0$, whichever is larger). The asymptotic mean
fitness is bounded from below by $(1 - U)\, w_{\max}$. In the remainder of the paper we will
restrict the discussion to unbounded mutation distributions.



3.2. Fitness distribution 

Note that the mutation rate (provided it is nonzero) enters (15) only through the
prefactor $1 - U$, which is known in population genetics as the mutational load [29].
Its origin can be traced back to the fact that, in each generation, the fitness is reset
randomly for a fraction $U$ of the population, so that selection can act only on the
remaining fraction $1 - U$ [compare to (3)].

To clarify this effect in a quantitative manner, let us find out what the frequency
distribution looks like in the asymptotic regime. To this end, we calculate higher
moments of $f_t(w)$ from (3) along with (12). For convenience, let

$$ \zeta_n(t) = \frac{1}{(1-U)\, w_0^n} \int_0^\infty dw\, w^n f_t(w). \tag{18} $$

A recursion relation for the moments can be found by multiplying both sides of (3)
by $w^{n-1}$ and integrating over $w$, which yields

$$ \zeta_n(t) = \zeta_1(t) \left( \zeta_{n-1}(t+1) - \frac{U}{1-U}\, \frac{G_{n-1}}{w_0^{n-1}} \right), \tag{19} $$

and $\zeta_1(t)$ can be read off from (12) or (15). The formal solution for $\zeta_n(t)$ is

$$ \zeta_n(t) = \prod_{k=0}^{n-1} \zeta_1(t+k) - \frac{U}{1-U} \sum_{k=1}^{n-1} \frac{G_k}{w_0^k} \prod_{i=0}^{n-k-1} \zeta_1(t+i). \tag{20} $$

Since $w_0 \zeta_1(t) \approx G_t/G_{t-1}$ for large $t$, $\zeta_n(t)$ becomes

$$ (1-U)\, w_0^n\, \zeta_n(t) \approx U G_n + (1-U)\, \frac{G_{t+n-1}}{G_{t-1}} - U \sum_{k=1}^{n} \frac{G_k G_{t+n-k-1}}{G_{t-1}}, \tag{21} $$

where the $k = n$ term of the sum cancels the first term $U G_n$.
Constructing the Laplace transform (or the moment generating function) of the
frequency distribution using (21) such that

$$ \int_0^\infty e^{-zw} f_t(w)\, dw \approx U \hat{g}(z) + (1-U)\, \phi_t(z) - U \psi_t(z), \tag{22} $$

where

$$ \hat{g}(z) = \sum_{n=0}^{\infty} \frac{(-z)^n}{n!}\, G_n = \int_0^\infty dw\, e^{-zw} g(w) \tag{23} $$

is the Laplace transform of $g(w)$ and

$$ \phi_t(z) = \sum_{n=0}^{\infty} \frac{(-z)^n}{n!}\, \frac{G_{t+n-1}}{G_{t-1}}, \qquad \psi_t(z) = \sum_{n=1}^{\infty} \frac{(-z)^n}{n!} \sum_{k=1}^{n} \frac{G_k G_{t+n-k-1}}{G_{t-1}}, \tag{24} $$

we can in turn find the frequency distribution through inverse Laplace transformation.
Since the Laplace transformation is linear, $f_t(w)$ can be written as

$$ f_t(w) \approx U g(w) + (1-U)\, T_t(w) - U \Psi_t(w), \tag{25} $$

where $T_t(w)$ and $\Psi_t(w)$ are the inverse Laplace transforms of $\phi_t(z)$ and $\psi_t(z)$,
respectively. Since ($n > 1$)

$$ G_n < \sum_{k=1}^{n} \frac{G_k G_{t+n-k-1}}{G_{t-1}} \ll \frac{G_{t+n-1}}{G_{t-1}} \tag{26} $$

because of the criterion (11), we expect that when $w$ is small [large], the dominant
contribution to $f_t(w)$ comes from $g(w)$ [$T_t(w)$]. Hence we neglect the contribution from
$\Psi_t(w)$, which gives

$$ f_t(w) \approx U g(w) + (1-U)\, T_t(w). \tag{27} $$
For the exponential distribution ($\nu = 0$, $\beta = 1$), the analytic form of $\phi_t(z)$ can be
found. For this case, $G_t = w_0^t\, t!$, $\hat{g}(z) = (1 + z w_0)^{-1}$, and

$$ \phi_t(z) = \sum_{n=0}^{\infty} \frac{(-z w_0)^n}{n!}\, \frac{(t+n-1)!}{(t-1)!} = \sum_{n=0}^{\infty} \binom{t+n-1}{n} (-z w_0)^n = (1 + z w_0)^{-t}. \tag{28} $$

Hence the frequency distribution is

$$ f_t(w) \approx \frac{U}{w_0}\, e^{-w/w_0} + \frac{1-U}{w_0\,(t-1)!} \left(\frac{w}{w_0}\right)^{t-1} e^{-w/w_0}. \tag{29} $$

One can easily check that the mean and the variance at large $t$ become§

$$ \bar{w}_t \approx (1-U)\, w_0\, t, \qquad \delta w^2 \approx U (1-U)\, w_0^2\, t^2, \tag{30} $$

where $\delta w^2 = \int w^2 f_t(w)\, dw - \bar{w}_t^2$. Standard deviation and mean are of the same order and
$\delta w$ is even larger than $\bar{w}_t$ if $U > 1/2$. The origin of such a large spread is the division of
(29) into two widely separated distributions: a time-independent part, arising from the
mutations, and a traveling wave reflecting the selection dynamics. The mean and
the variance of $T_t(w)$ in the asymptotic regime are found to be $t w_0$ and $t w_0^2$, respectively.
Hence the spread of $T_t$ is much smaller than its mean in the asymptotic regime, which
cannot be appreciated from the full variance in (30). By the central limit theorem‖,
$T_t(w)$ is approximated by the Gaussian distribution

$$ T_t(w) \approx \frac{1}{\sqrt{2\pi t w_0^2}} \exp\left( -\frac{(w - t w_0)^2}{2 t w_0^2} \right). \tag{31} $$
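The statement about mean and variance can be checked directly from the two-component form of the frequency distribution in the exponential case. The sketch below assumes the mixture of an exponential density (weight $U$) and a Gamma density with shape $t$ and scale $w_0$ (weight $1-U$), and uses their standard first and second moments. The function name is illustrative.

```python
def mixture_mean_and_variance(U, t, w0=1.0):
    """Mean and variance of U * Exp(w0) + (1 - U) * Gamma(shape=t, scale=w0)."""
    # exponential part: E[w] = w0, E[w^2] = 2 w0^2
    # Gamma part:       E[w] = t w0, E[w^2] = t (t + 1) w0^2
    mean = U * w0 + (1.0 - U) * t * w0
    second = 2.0 * U * w0 ** 2 + (1.0 - U) * t * (t + 1) * w0 ** 2
    return mean, second - mean ** 2
```

For `U = 0.5` and `t = 1000` this gives a mean of 500.5 and a variance of 250000.75, to be compared with the leading terms $(1-U)\,w_0\,t = 500$ and $U(1-U)\,w_0^2\,t^2 = 250000$. The Gamma component alone has variance $t w_0^2 = 1000$, which illustrates that the large total spread comes from the separation of the two components.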

Although we cannot find an analytic form of $f_t(w)$ for the general class of
distributions (13), the qualitative form is expected to be the same as in the exponential
case, that is, a superposition of a stationary distribution due to mutations and a
Gaussian travelling wave. Based on this conjecture, we can approximate the travelling
wave for the general case. From the generating function of $T_t(w)$, which is $\phi_t(z)$, one can
get the leading behavior of the mean and variance of $T_t(w)$ as

$$ -\phi_t'(0) = \frac{G_t}{G_{t-1}} \approx w_0 \left(\frac{t}{\beta}\right)^{1/\beta}, \tag{32} $$

$$ \phi_t''(0) - \phi_t'(0)^2 = \frac{G_{t+1}}{G_{t-1}} - \left(\frac{G_t}{G_{t-1}}\right)^2 \approx \frac{w_0^2}{\beta^2} \left(\frac{t}{\beta}\right)^{(2-\beta)/\beta}, \tag{33} $$

that is, the travelling wave is Gaussian with mean $\bar{w}_t/(1 - U)$ and width $\sim t^{(2-\beta)/(2\beta)}$.
Since we know the exact values of $\bar{w}_t$ from the numerical iteration of (9), we can also
numerically calculate the frequency density from (3). Figure 4 numerically confirms for
the exponential and Gaussian mutation distributions that the frequency distribution is
divided into two parts and that the travelling part takes the Gaussian form with the
mean and variance predicted in this section. The decomposition (27) will play a considerable role
in understanding the behavior of finite populations with large $U$, see section 4.6.

§ In [28], $\delta w^2$ was also calculated but the factor $(1 - U)$ is missing in the result.

‖ Recall that the Gamma distribution with integer $t$ can be interpreted as the distribution of the sum
of $t$ independent and identically distributed random variables with exponential distribution [43].
Figure 4. Comparison of the exact numerical solution for the frequency distribution
with the travelling wave equation. Left panel: Frequency distributions at generation
100, 200, and 400 (from left to right) are shown for U = 0.5 (square) and U = lO"'^
(circle). There is a slight mismatch due to the neglect of $\Psi_t(w)$ but in general the
travelling wave solution approximates the true distribution quite well. Right panel:
Similar study to the left panel with the Gaussian distribution. The data are collected
at generation 500, 1000, and 2000. The Gaussian approximation is almost perfect.
The frequency distribution for small $w$ is $U g(w)$ for both panels as reasoned in the
text (data not shown).



4. Finite populations 

4.1. Fixation and clonal interference

An important element in the evolution of finite populations is the process of fixation, in
which a mutation that is initially present in a single individual spreads in the population
and eventually is shared by all individuals (see figure 1 for illustration). Consider
the simple case of a single mutant of fitness $w'$ entering a genetically homogeneous
population in which all individuals have the same fitness $w$. The success of the mutant
is determined by the selection coefficient

$$ s = \frac{w'}{w} - 1, \tag{34} $$

which is positive (negative) for beneficial (deleterious) mutations. For the WF model
the fixation probability is given approximately by

$$ \pi_N(s) \approx \frac{1 - e^{-2s}}{1 - e^{-2Ns}}. \tag{35} $$

When selection is strong, in the sense that

$$ N|s| \gg 1, \tag{36} $$

it can be seen from (35) that fixation of deleterious mutations becomes exponentially
unlikely, while a beneficial mutation is fixed with probability

$$ \pi(s) \approx 1 - e^{-2s}. \tag{37} $$
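A small numerical sketch of the fixation probability and its strong-selection limit, assuming the standard form $\pi_N(s) \approx (1 - e^{-2s})/(1 - e^{-2Ns})$. The function names are illustrative, and the cutoff used to avoid floating point overflow for strongly deleterious mutants is an implementation choice of this sketch.

```python
import math

def fixation_probability(s, N):
    """pi_N(s) = (1 - exp(-2 s)) / (1 - exp(-2 N s)) for a single mutant."""
    if s == 0.0:
        return 1.0 / N  # neutral limit of the ratio
    x = -2.0 * N * s
    if x > 700.0:
        # strongly deleterious: exp(x) would overflow and the ratio is negligible
        return 0.0
    return math.expm1(-2.0 * s) / math.expm1(x)

def strong_selection_limit(s):
    """pi(s) = 1 - exp(-2 s) for beneficial mutations, 0 otherwise."""
    return -math.expm1(-2.0 * s) if s > 0.0 else 0.0
```

For `N = 10**4` and `s = 0.1` the two functions agree to machine precision, while for `s = -0.1` the fixation probability is numerically zero, in line with the strong selection criterion $N|s| \gg 1$.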
Previous work on the house of cards model in finite populations has been concerned with
the weak selection regime, where $N|s| \sim 1$ [30, 31, 32]. This regime can be realized by
choosing a mutation distribution $g(w)$ whose standard deviation is much smaller than
the mean. However, in the context of the present paper the strong selection criterion
(36) is always satisfied.

The mean time to fixation of a beneficial mutation is given by

$$ t_{\rm fix} \approx \frac{\ln N}{s}. \tag{38} $$

Different evolutionary regimes arise from the comparison of $t_{\rm fix}$ to the expected time
interval between the emergence of beneficial mutations that are destined for fixation.
Denoting by $U_b$ the beneficial mutation probability per individual, beneficial mutations
arise at rate $N U_b$. A mutation fixes with probability $\pi(s_b) \approx 2 s_b$, where $s_b$ is the typical
selection coefficient which is assumed to be small, so that (37) can be approximated
by $2s$. Then the waiting time between fixation events is $t_{\rm mut} \approx 1/(2 N U_b s_b)$. When
$t_{\rm fix} \ll t_{\rm mut}$ or [42]

$$ 2 N \ln N\, U_b \ll 1, \tag{39} $$

beneficial mutations arise rarely and fix independently, a regime that is referred to
as periodic selection [15]. In the opposite case $2 N \ln N\, U_b \gg 1$ clones originating
from different mutants compete for fixation, a phenomenon that is known as clonal
interference [IQ].

In previous studies of clonal interference [371 HOI El SSI HS] it has usually been
assumed that $U_b$ is a constant parameter, in which case (39) is a condition on the
population size $N$ which is violated when $N$ becomes large. However, in a rugged
landscape the supply of beneficial mutations decreases as the mean fitness grows. If the
fitness distribution of the population is well clustered around its mean $\bar{w}$, the probability
of beneficial mutations can be estimated by

$$ U_b(\bar{w}) = U\, \mathrm{Prob}[w > \bar{w}] = U \int_{\bar{w}}^{\infty} dw\, g(w), \tag{40} $$

which vanishes for $\bar{w} \to \infty$ for any unbounded distribution $g(w)$. Thus clonal
interference is a transient phenomenon in rugged fitness landscapes. Asymptotically
almost all mutations are deleterious, and hence $U$ can be identified with the probability
of deleterious mutations.
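As an illustration of why clonal interference is transient, the sketch below evaluates the beneficial mutation probability for the exponential distribution $g(w) = e^{-w}$, for which the tail integral gives $U_b(\bar{w}) = U e^{-\bar{w}}$, together with the combination $2N \ln N\, U_b$ that separates the two regimes. The function names and the parameter values in the usage note are illustrative assumptions.

```python
import math

def beneficial_mutation_probability(U, wbar):
    """U_b(wbar) = U * Prob[w > wbar] for g(w) = exp(-w)."""
    return U * math.exp(-wbar)

def interference_parameter(N, U, wbar):
    """2 N ln(N) U_b: much less than 1 means periodic selection, much more means interference."""
    return 2.0 * N * math.log(N) * beneficial_mutation_probability(U, wbar)

def crossover_fitness(N, U):
    """Mean fitness at which 2 N ln(N) U_b(wbar) equals 1."""
    return math.log(2.0 * N * math.log(N) * U)
```

With the hypothetical values `N = 10**6` and `U = 0.01`, the crossover lies near $\bar{w} \approx 12.5$: below it the parameter exceeds one, above it the parameter decays exponentially in $\bar{w}$.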

4.2. Instantaneous fixation and the diluted record process

The fact that the criterion (39) is asymptotically satisfied for unbounded $g(w)$ implies
that the fixation of beneficial mutations occurs as independent events that can then
as well be treated as instantaneous ($t_{\rm fix} \to 0$). Deleterious mutations do not fix, but
they lower the mean population fitness through the mutational load $1 - U$ (compare
to section 3). For the time being, we will neglect this effect, which amounts to taking
$U \to 0$. We will show in section 4.6 how the influence of deleterious mutations can be
approximately reinstated.
When the effects of deleterious mutations are ignored, the population fitness
between fixation events is equal to the fitness of the last beneficial mutation that was
successfully fixed, and the population is genetically homogeneous at all times (except
for the instances of fixation). The model reduces to a simple point process that can be
informally described as follows [31, 32]:

(i) Mutations with fitness values drawn randomly and independently from $g(w)$ are
generated in discrete time according to a Poisson distribution with mean $NU$.

(ii) The fitness of the mutant is compared to the current population fitness; deleterious
mutations are discarded while beneficial mutations are fixed with probability $\pi(s)$
given by (37).

(iii) The fitness of the successfully fixed mutant replaces the current population fitness,
which therefore evolves according to a piecewise constant, strictly increasing jump
process.

If every beneficial mutation were fixed, such that $\pi(s) = \theta(s)$, the process described
above would be identical to a variant of the well-known problem of record statistics for
sequences of independent, identically distributed variables [33, 34], in which a
Poisson-distributed number of new variables is created in each discrete time step. When $\pi(s) < 1$
some of the record events are lost in a way that is correlated to the corresponding
record values. The process defined by the rules (i)-(iii) is referred to in the following as
the diluted record process (DRP), and it will be studied extensively in the next three
subsections.
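The rules (i)-(iii) can be sketched in a few lines. This is a simplified illustration rather than the simulation code of the paper: it assumes $g(w) = e^{-w}$ and $\pi(s) = 1 - e^{-2s}$, and it generates mutations one at a time, so that the mutation counter plays the role of a dimensionless time instead of drawing a Poisson number of mutations per generation. The ordinary record process is run on the same random numbers for comparison, and the function name is illustrative.

```python
import math
import random

def diluted_record_process(n_mutations, w_init=1.0, seed=0):
    """Return a list of (n, w_drp, w_rp) after each of n_mutations candidate mutations."""
    rng = random.Random(seed)
    w_drp = w_rp = w_init
    history = []
    for n in range(1, n_mutations + 1):
        w_new = rng.expovariate(1.0)          # rule (i): fitness drawn from g(w) = exp(-w)
        if w_new > w_rp:
            w_rp = w_new                      # record process: every record is accepted
        if w_new > w_drp:                     # rule (ii): only beneficial mutants can fix
            s = w_new / w_drp - 1.0
            if rng.random() < -math.expm1(-2.0 * s):   # accept with pi(s) = 1 - exp(-2 s)
                w_drp = w_new                 # rule (iii): fixed mutant sets the new fitness
        history.append((n, w_drp, w_rp))
    return history
```

By construction the DRP fitness never exceeds the record value in this sketch, since it is always either the initial fitness or one of the candidates seen so far.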

For long times, when beneficial mutations become increasingly rare, the discrete
unit of time is unimportant and the dependence on the system parameters $N$ and $U$
can be eliminated by using the dimensionless time variable $\tau = UNt$. Asymptotically
the DRP is therefore fully specified by the functions $\pi(s)$ and $g(w)$. A comparison
between the DRP and the full WF dynamics is shown in figure 5. For large $N$ an initial
regime can be identified in which clonal interference reduces the fitness in the WF model
compared to the DRP, and for large $U$ the fitness is reduced by the mutational load
effect, but for long times and $U \ll 1$ the agreement is seen to be essentially perfect. In
this and the following three subsections, we restrict ourselves to $U \ll 1$, and the analysis
of the behavior for large $U$ is deferred to section 4.6.

Clearly the record process (RP) provides an upper bound on the DRP at all times
[32]; for completeness, a derivation of the fitness distribution for the RP is provided in
Appendix B. We will now argue that the RP bound is in fact saturated for the leading
order behavior of the mean fitness, in the sense that

$$ \lim_{\tau \to \infty} \frac{\bar{w}_{\rm DRP}(\tau)}{\bar{w}_{\rm RP}(\tau)} = 1. \tag{41} $$

To see why this is so, denote by $w_1$ the largest fitness value that has appeared up to
time $\tau$, and by $w_2$ the largest fitness that has also been fixed. Then $\bar{w}(\tau) = w_1$ in the
record process and $\bar{w}(\tau) = w_2$ in the DRP. If $w_1$ were asymptotically larger than $w_2$
Figure 5. Left panel: Semilogarithmic plots of the fitness vs dimensionless time
$\tau = NUt$ for U = 10"^ (N = 10^, lO'^, 10^ 10*^), U = lO^^ and U = 0.5
(N = 10'^, 10'*, 10^) obtained from simulations of the WF model. The mutation
distribution and the initial frequency distribution are the same as those in figure 2. Right
panel: The comparison of the simulation of the WF model to the DRP for N = 10'^
and U = 10^^ ($\tau$ = lO^^t). The difference in the fitness is barely observable. Inset:
Same type of comparison for the variance $\kappa_2$ [see (61)] of the fitness.



in the sense that $w_1/w_2 > C > 1$, then the selection coefficient of $w_1$ in a background
of $w_2$ is $s_{12} = w_1/w_2 - 1 > C - 1 > 0$ and the corresponding fixation probability
$\pi(s_{12})$ is bounded away from zero. It follows that $w_1$ is fixed with finite probability, in
contradiction to our assumption that $w_2$ is the largest fitness that has been fixed.

We will see later that the average fitness $w_2$ at time $\tau$ for the exponential
distribution is of order $w_2 \sim \ln\tau - \ln\ln\tau$, while $w_1 \sim \ln\tau + \mathrm{const}$. As the difference
between the largest and $k$th largest value is of order $\ln k$ for exponential random variables
[16], this implies that the rank of $w_2$ among the $\tau$ fitness values that have been created
up to time $\tau$ is $O(\ln\tau)$.
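
The logarithmic gap between order statistics can be checked directly. The following is a minimal sketch, assuming unit-mean exponential variables; the function names and sample sizes are illustrative choices, not part of the original analysis. It uses the standard result that the spacings of exponential order statistics are independent exponentials, so that the mean gap between the largest and the $k$th largest value is the harmonic sum $\sum_{j=1}^{k-1} 1/j \approx \ln k + \gamma$.

```python
import math
import random

def expected_gap(k):
    """Exact mean of (largest - k-th largest) for i.i.d. unit exponentials."""
    return sum(1.0 / j for j in range(1, k))

def simulated_gap(n, k, samples, rng):
    """Monte Carlo estimate of the same gap from n variables per sample."""
    total = 0.0
    for _ in range(samples):
        values = sorted((rng.expovariate(1.0) for _ in range(n)), reverse=True)
        total += values[0] - values[k - 1]
    return total / samples

rng = random.Random(1)          # fixed seed so the run is reproducible
n, k = 1000, 10
exact = expected_gap(k)
estimate = simulated_gap(n, k, samples=2000, rng=rng)
print("harmonic sum:", exact, "simulation:", estimate, "ln k:", math.log(k))
```

The gap does not depend on $n$, which is why a rank of order $\ln\tau$ translates into a fitness deficit of order $\ln\ln\tau$.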

In the following subsections we will develop some analytic tools to systematically 
compute the mean fitness and higher fitness moments for the DRP. 



4.3. Mean field approximation

In the mean field approximation (MFA) the fitness distribution of the population is
characterized only by its mean, which will be denoted by $m(\tau)$ in the following. The
probability that a new mutation with arbitrary fitness $w' > m$ is fixed is given by



$$p_{\mathrm{fix}} = \int_m^\infty \pi\!\left(\frac{w'-m}{m}\right) g(w')\,dw' = m\int_0^\infty \pi(x)\,g(mx+m)\,dx, \tag{42}$$

and the waiting time until this happens (in dimensionless units) is $\Delta\tau = 1/p_{\mathrm{fix}}$. Once
fixation occurs, the population fitness increases by the amount

$$\Delta w = \frac{\int_m^\infty w'\,\pi\!\left(\frac{w'-m}{m}\right) g(w')\,dw'}{\int_m^\infty \pi\!\left(\frac{w'-m}{m}\right) g(w')\,dw'} - m = m\,\frac{\int_0^\infty x\,\pi(x)\,g(mx+m)\,dx}{\int_0^\infty \pi(x)\,g(mx+m)\,dx}, \tag{43}$$

Figure 6. Comparison of the WF simulation with the record, mean field, and the
improved approximation scheme in section 4.4 with the stable value of $C_{\mathrm{DRP}} = 8$.
The WF simulation results are obtained with the exponential mutation distribution 
^ with N = 10^ and U = 10^^ (r = t). As explained in the text, mean field theory 
and record dynamics give lower and upper bounds on the true asymptotics. 



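and $m(\tau)$ is obtained by solving the differential equation

$$\frac{dm}{d\tau} = \frac{\Delta w}{\Delta\tau} = p_{\mathrm{fix}}\,\Delta w. \tag{44}$$

For the exponential distribution (2) this takes the explicit form

$$\frac{dm}{d\tau} = \frac{4(m+1)}{(m+2)^2}\,e^{-m}, \tag{45}$$

with the solution

$$e^{m}(m+2) + I(m) = 4(\tau-\tau_0) + e^{w_0}(w_0+2) + I(w_0), \tag{46}$$

where

$$I(m) = \int_0^m \frac{e^{x}}{x+1}\,dx \sim e^{m}\sum_{n=0}^{\infty}\frac{n!}{(m+1)^{n+1}}. \tag{47}$$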



The asymptotic expansion, which is a divergent series, is obtained by integration by parts.
Now assume that $\tau$ is extremely large; then the approximate inversion formula of (46)
is the solution of the equation

$$m\,e^{m} \approx 4\tau \;\Rightarrow\; m + \ln m \approx \ln(4\tau). \tag{48}$$

From (48), one can easily see that $\lim_{\tau\to\infty} \ln\tau/m = 1$. Let $m(\tau) = (1 + f(\tau))\ln(4\tau)$
where $f(\tau) \to 0$ as $\tau \to \infty$. Then

$$f(\tau)\ln(4\tau) + \ln\ln(4\tau) + \ln(1 + f(\tau)) = 0 \;\Rightarrow\; f(\tau) \approx -\frac{\ln\ln 4\tau}{\ln 4\tau}, \tag{49}$$

which, as expected, approaches zero in the asymptotic regime. According to extreme
value statistics, the largest fitness up to time $\tau$ is of order $\ln\tau + \gamma$, where $\gamma$ ($\approx 0.5772$)
is the Euler-Mascheroni constant. But the leading behavior of $m(\tau)$ is $\ln\tau + \ln 4$, which seems
contradictory. This apparent paradox can be resolved by looking at the difference
$\ln(\tau) - m(\tau) = O(\ln\ln\tau) > 0$, which diverges for $\tau \to \infty$. In figure 6, the MFA
solution (48) with the correction (49) is compared to the WF simulation and the record
problem.
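
As a numerical cross-check of the asymptotic inversion, the mean field equation $dm/d\tau = 4(m+1)e^{-m}/(m+2)^2$ can be integrated directly and compared with the root of $m + \ln m = \ln(4\tau)$. The sketch below is illustrative only: the initial condition $m(\tau = 1) = 1$, the Runge-Kutta step size, and all function names are assumptions made for the example. The integration is carried out in the variable $u = \ln\tau$ to cover many decades of $\tau$.

```python
import math

def rhs(u, m):
    """dm/du with u = ln(tau): tau * 4 (m+1) exp(-m) / (m+2)^2."""
    return 4.0 * (m + 1.0) * math.exp(u - m) / (m + 2.0) ** 2

def integrate_mfa(m0, u_end, steps):
    """Classical fourth-order Runge-Kutta in log-time, from m(tau=1) = m0."""
    h = u_end / steps
    u, m = 0.0, m0
    for _ in range(steps):
        k1 = rhs(u, m)
        k2 = rhs(u + h / 2, m + h * k1 / 2)
        k3 = rhs(u + h / 2, m + h * k2 / 2)
        k4 = rhs(u + h, m + h * k3)
        m += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        u += h
    return m

def asymptotic_m(log_tau):
    """Root of m + ln m = ln(4 tau), found by bisection."""
    target = math.log(4.0) + log_tau
    lo, hi = 1.0, target
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid + math.log(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

u_end = 30.0                               # tau = e^30, roughly 1e13
m_numeric = integrate_mfa(1.0, u_end, 30000)
m_asymptotic = asymptotic_m(u_end)
print("numerical m:", m_numeric)
print("asymptotic m:", m_asymptotic)
print("ln(4 tau) - m:", math.log(4.0) + u_end - m_numeric)
```

The two values differ only by a term of order $1/m$, while both lie well below $\ln(4\tau)$, in line with the $\ln\ln\tau$ deficit discussed in the text.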

For fitness distributions with a power law tail (17), we get ($a > 1$)

$$p_{\mathrm{fix}} = a\,m^{-a} J(a+1,m), \qquad \Delta w = m\,\frac{J(a,m) - J(a+1,m)}{J(a+1,m)} - 1, \tag{50}$$

where

$$J(a,m) = \int_1^\infty \left(1 - e^{-2(x-1)}\right)\left(\frac{1}{m} + x\right)^{-a} dx. \tag{51}$$

To have a meaningful result, we should restrict ourselves to the case $a > 1$. Hence (44)
for the power law case becomes

$$\frac{dm}{d\tau} = a\,m^{-a+1}\bigl(J(a,m) - J(a+1,m)\bigr) - a\,m^{-a}J(a+1,m) \approx a\,m^{-a+1}\bigl(K(a) - K(a+1)\bigr), \tag{52}$$

where $K(a) = J(a,\infty)$. Evaluation of the differential equation (52) in the asymptotic
regime ($m \gg 1$) yields

$$m \approx \tau^{1/a}\bigl(a^2 (K(a) - K(a+1))\bigr)^{1/a} \equiv \tau^{1/a} L(a). \tag{53}$$

To complete the analysis, let us compare $L(a)$ to the prefactor obtained in (92) for the
record process. When $a - 1 \ll 1$, the two prefactors have the expansion

$$L(a) = \frac{1}{a-1} + \ln(a-1) + 0.916\,014 + o(1), \qquad \Gamma\!\left(\frac{a-1}{a}\right) = \frac{1}{a-1} + (1-\gamma) + o(1), \tag{54}$$

which shows that the RP prefactor is larger than that of the MFA in this regime. When
$a \gg 1$, the asymptotic behavior of the MFA prefactor can be obtained by integration
by parts. One finds

$$L(a) = 1 + \frac{1}{a}\ln(4/a) + o(1/a) < 1, \qquad \Gamma\!\left(1 - \frac{1}{a}\right) = 1 + \frac{\gamma}{a} + o(1/a) > 1, \tag{55}$$

which also suggests that the MFA prediction is smaller than the RP value. Actually,
the MFA prefactor becomes smaller than unity when $a > 3.533\,18$ while $\Gamma((a-1)/a)$
remains larger than unity. In between, one can numerically check that the RP prefactor
is always larger than that of the MFA. In fact, we will show in the next subsection
that the MFA always provides a lower bound on the true mean fitness of the DRP. To
summarize the mean field theory for the power law distribution, the MFA fitness is of

Figure 7. Log-log plots of $w(\tau)$ vs $\tau$ for the simulation results of the WF dynamics
with the power law distribution (fT7|l with a = | (upper data set), a = 2 (middle data 
set), and a — 4 (lower data set). The mutation probability is set to [/ = 10"'* and 
the population sizes are 10* and 10^ for a = | and 2, and 10*^ and 10^ for a = A, 
respectively. For comparison, the record solution (|92p (straight line segments above 
the simulation results) and the MF prediction ([55)1 (straight line segments below the 
simulation results) are also depicted. 



the same order as the extremal statistics estimate (93), but the smaller prefactor is not
consistent with the relation (41).
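
The comparison of the two prefactors can be reproduced numerically. The sketch below is a hedged illustration: it assumes the fixation probability $\pi(x) = 1 - e^{-2x}$ that appears in the integrand of $J(a,m)$, and it rewrites $K(a) - K(a+1) = 1/(a(a-1)) - E(a)$ with $E(a) = \int_0^\infty t\,e^{-2t}(1+t)^{-(a+1)}dt$, which follows from the substitution $x = 1 + t$. Function names, the Simpson grid, and the bisection bracket are choices made for this example.

```python
import math

def tail_integral(a, upper=40.0, n=8000):
    """Simpson estimate of E(a) = int_0^inf t exp(-2t) (1+t)^-(a+1) dt."""
    h = upper / n

    def f(t):
        return t * math.exp(-2.0 * t) * (1.0 + t) ** (-(a + 1.0))

    s = f(0.0) + f(upper)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(i * h)
    return s * h / 3.0

def mfa_prefactor(a):
    """L(a) = (a^2 (K(a) - K(a+1)))^(1/a), valid for a > 1."""
    k_diff = 1.0 / (a * (a - 1.0)) - tail_integral(a)
    return (a * a * k_diff) ** (1.0 / a)

def rp_prefactor(a):
    """Record-process prefactor Gamma((a-1)/a)."""
    return math.gamma((a - 1.0) / a)

for a in (1.5, 2.0, 3.0, 4.0, 6.0):
    print(a, mfa_prefactor(a), rp_prefactor(a))

# Locate the value of a at which L(a) drops below unity (bisection on [3, 4]).
lo, hi = 3.0, 4.0
for _ in range(40):
    mid = 0.5 * (lo + hi)
    if mfa_prefactor(mid) > 1.0:
        lo = mid
    else:
        hi = mid
crossing = 0.5 * (lo + hi)
print("L(a) = 1 at a =", crossing)
```

In the limit $a \to 1$ the same rewriting gives the constant in the small-$(a-1)$ expansion as $1 - E(1)$, which can be evaluated with `tail_integral(1.0)`.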

Figure 7 compares the WF simulation results with the record solution (92) and the
MF result (53) for three values of $a$. For the smallest value of $a$, the RP result is in good agreement
with the WF simulations, but as $a$ increases, both bounds increasingly deviate from the
simulation data. Since for $a \to \infty$ the power law distribution approaches a distribution
of exponential type, the results for the exponential distribution summarized in figure 6
indicate that a similar discrepancy should be expected for larger values of $a$. For $a < 2$,
the variance of the distribution of record values is infinite, which implies that the value
of a new record is usually much larger than the previous record and $\pi(s) \approx 1$. In
this case the DRP becomes identical to the RP. On the other hand, as $a$ gets bigger,
the ratio of two consecutive records approaches unity in the asymptotic regime. This
implies stronger corrections to the leading behavior which, according to (41), should
still be given by the RP.

4.4. Master equation

To improve on the mean field approximation, we will first derive an equation for the
transition probability $p(w,\tau|w_0,0)$ of the DRP‡ [31]. Since the model is a Markov
process, the conditional probability $p(w,\tau+d\tau|w',\tau)$ completely specifies the evolution
equation. We have

$$p(w,\tau+d\tau|w',\tau) = \delta(w-w')\left(1 - d\tau\int_{w'}^\infty \pi\!\left(\frac{x-w'}{w'}\right) g(x)\,dx\right) + d\tau\,\Theta(w-w')\,g(w)\,\pi\!\left(\frac{w-w'}{w'}\right). \tag{56}$$

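Using the Markov property, the equation for $p(w,\tau) \equiv p(w,\tau|w_0,\tau_0)$ can be found as
follows:

$$p(w,\tau+d\tau) = \int dx\; p(w,\tau+d\tau|x,\tau)\,p(x,\tau) = p(w,\tau)\left[1 - d\tau\int_w^\infty \pi\!\left(\frac{x-w}{w}\right) g(x)\,dx\right] + g(w)\,d\tau\int_{w_0}^{w} \pi\!\left(\frac{w-x}{x}\right) p(x,\tau)\,dx, \tag{57}$$

and therefore

$$\frac{\partial p(w,\tau)}{\partial\tau} = -p(w,\tau)\int_w^\infty \pi\!\left(\frac{x-w}{w}\right) g(x)\,dx + g(w)\int_{w_0}^{w} \pi\!\left(\frac{w-x}{x}\right) p(x,\tau)\,dx. \tag{58}$$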

The description by the master equation (58) is appropriate for a Markov process like
the DRP which has non-continuous sample paths [17]. One can easily check that the
record probability (86) solves (58) when $\pi(x) = \Theta(x)$.
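
The jump process described by the master equation can be sampled directly. The following is a minimal event-driven sketch, not the simulation code of the original work. It assumes the exponential mutation distribution $g(w) = e^{-w}$, a unit arrival rate of mutations in the scaled time $\tau$, and the fixation probability $\pi(s) = 1 - e^{-2s}$; these forms, the parameter values, and all names are assumptions made for illustration. Since only candidates above the current fitness $w$ matter, the sketch draws the waiting time to the next such candidate with rate $Q(w) = e^{-w}$ and uses the memoryless property for its excess over $w$.

```python
import math
import random

def fixation_probability(s):
    """Assumed strong-selection form pi(s) = 1 - exp(-2 s) for s > 0."""
    return 1.0 - math.exp(-2.0 * s)

def simulate(tau_max, w0, diluted, rng):
    """Fitness at scaled time tau_max for one realisation (requires w0 > 0).

    diluted=False gives the record process (every record candidate accepted),
    diluted=True the diluted record process (candidate accepted with pi).
    """
    tau, w = 0.0, w0
    while True:
        tau += rng.expovariate(math.exp(-w))   # next candidate above w
        if tau > tau_max:
            return w
        candidate = w + rng.expovariate(1.0)   # excess is again exponential
        if not diluted or rng.random() < fixation_probability((candidate - w) / w):
            w = candidate

rng = random.Random(7)                          # fixed seed for reproducibility
tau_max, w0, runs = 1.0e4, 1.0, 2000
rp_mean = sum(simulate(tau_max, w0, False, rng) for _ in range(runs)) / runs
drp_mean = sum(simulate(tau_max, w0, True, rng) for _ in range(runs)) / runs
print("record process mean:", rp_mean)
print("diluted record process mean:", drp_mean)
print("ln(tau) + gamma:", math.log(tau_max) + 0.5772156649)
```

With these settings the record process mean should sit close to $\ln\tau + \gamma$, and the diluted process should lie below it.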

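From (58) evolution equations for arbitrary expectation values $\langle f(w,\tau)\rangle$ can be
obtained in the form

$$\frac{d}{d\tau}\langle f(w,\tau)\rangle = \frac{d}{d\tau}\int dw\, p(w,\tau) f(w,\tau) = \int dw \left[ f(w,\tau)\frac{\partial p(w,\tau)}{\partial\tau} + \frac{\partial f(w,\tau)}{\partial\tau}\,p(w,\tau)\right] = \left\langle w\int_0^\infty \bigl[f(wx+w,\tau) - f(w,\tau)\bigr]\pi(x)\,g(wx+w)\,dx\right\rangle + \left\langle\frac{\partial f}{\partial\tau}\right\rangle. \tag{59}$$

For example, the equation for the centered normalized moment $\kappa_n = \langle (w - \langle w\rangle)^n\rangle/n!$ is

$$\frac{d\langle w\rangle}{d\tau} = \left\langle w^2\int_0^\infty x\,\pi(x)\,g(wx+w)\,dx\right\rangle, \tag{60}$$

$$\frac{d\kappa_n}{d\tau} = \sum_{r=1}^{n}\left\langle \frac{w^{r+1}}{r!}\,\frac{\delta w^{\,n-r}}{(n-r)!}\int_0^\infty dx\, x^r\pi(x)\,g(wx+w)\right\rangle - \kappa_{n-1}\frac{d\langle w\rangle}{d\tau}, \tag{61}$$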

where $\delta w = w - \langle w\rangle$. For the cases of the exponential distribution (2) and the power
law (17), (60) becomes

$$\frac{d\langle w\rangle}{d\tau} = \left\langle \frac{4(w+1)}{(w+2)^2}\,e^{-w}\right\rangle, \tag{62}$$

$$\frac{d\langle w\rangle}{d\tau} = a\left\langle w^{-a+1}\bigl(J(a,w) - J(a+1,w)\bigr) - w^{-a}J(a+1,w)\right\rangle, \tag{63}$$

which reduce to the mean field equations (45) and (52) if we approximate $\langle\chi(w)\rangle \approx
\chi(\langle w\rangle)$, where $\chi(w)$ is the function inside the brackets on the right hand side of (62), (63).
Again note that (63) is meaningful only if $a > 1$. Since $\chi(w)$ is a convex function
asymptotically, that is, $\chi''(w) > 0$ for $w \gg 1$, $\chi(\langle w\rangle) < \langle\chi(w)\rangle$, which means that the
mean field theory yields a lower bound on the true asymptotics.

‡ When the fixation of deleterious mutations is included by using the expression (35) for the fixation
probability, for suitable choices of $g(w)$ the master equation satisfies detailed balance with respect to a
stationary distribution [31]. In the present setting where selection is assumed to be strong in the sense
of (36), this stationary distribution cannot be reached on reasonable time scales.

4.5. Moment expansion

We now develop a systematic approximation scheme which extends beyond mean field
theory. First we write down the differential equations for $m = \langle w\rangle, \kappa_2, \ldots, \kappa_\ell$ which are
available from Eqs. (60) and (61) once $g(w)$ is given. Then we expand the terms on the
right hand side, of the general form $\langle\chi(w)\rangle$, up to $\kappa_\ell$ in such a way that

$$\langle\chi(m + \delta w)\rangle = \chi(m) + \frac{d^2\chi}{dm^2}\,\kappa_2 + \frac{d^3\chi}{dm^3}\,\kappa_3 + \ldots \tag{64}$$

and keep terms only up to $\kappa_\ell$ in all equations for $\kappa_n$ ($\ell \ge n$). If we only keep terms up
to $\ell = 1$, then we arrive at the mean field equation. If we keep terms up to $\ell = 2$, then
we have

$$\frac{dm}{d\tau} = \frac{4(m+1)}{(m+2)^2 e^{m}} + \frac{4(m^3 + 7m^2 + 14m + 2)}{(2+m)^4 e^{m}}\,\kappa_2, \tag{65}$$

$$\frac{d\kappa_2}{d\tau} = \frac{2(3m^2 + 6m + 4)}{(m+2)^3 e^{m}} - \frac{2(m^4 + 8m^3 + 30m^2 + 68m + 16)}{(m+2)^5 e^{m}}\,\kappa_2 \tag{66}$$

for exponential $g(w)$. Since we are interested in the asymptotic behavior, the above
equations are approximated for large $m$ by

$$\frac{dm}{d\tau} \approx \frac{4(1+\kappa_2)}{m\,e^{m}}, \qquad \frac{d\kappa_2}{d\tau} \approx \frac{2(3-\kappa_2)}{m\,e^{m}}.$$

Thus we see that $\kappa_2 \to 3$ and the mean field equation for $m$ receives a multiplicative
fluctuation correction. By increasing $\ell$, the solution should become more accurate.
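
The polynomial coefficients of the $\ell = 2$ truncation can be verified by finite differences. The sketch below is an illustrative consistency check under stated assumptions: exponential $g(w) = e^{-w}$, fixation probability $\pi(x) = 1 - e^{-2x}$, and $\kappa_2 = \langle\delta w^2\rangle/2$. With these, the $\kappa_2$ coefficient in the equation for $m$ is $\chi''(m)$ with $\chi(w) = 4(w+1)e^{-w}/(w+2)^2$, and the $\kappa_2$ coefficient in the equation for $\kappa_2$ is $2\chi'(m) + \psi''(m)$ with $\psi(w) = 2(3w^2+6w+4)e^{-w}/(w+2)^3$. The function names and step sizes are choices made for the example.

```python
import math

def chi(w):
    """4 (w+1) exp(-w) / (w+2)^2, the integrand of the mean equation."""
    return 4.0 * (w + 1.0) * math.exp(-w) / (w + 2.0) ** 2

def psi(w):
    """2 (3w^2+6w+4) exp(-w) / (w+2)^3, the zeroth-order term for kappa_2."""
    return 2.0 * (3.0 * w * w + 6.0 * w + 4.0) * math.exp(-w) / (w + 2.0) ** 3

def d1(f, x, h=1e-3):
    """Central first difference."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

def d2(f, x, h=1e-3):
    """Central second difference."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)

def coeff_mean(m):
    """kappa_2 coefficient in the truncated equation for m."""
    return 4.0 * (m**3 + 7.0 * m**2 + 14.0 * m + 2.0) * math.exp(-m) / (m + 2.0) ** 4

def coeff_kappa2(m):
    """kappa_2 coefficient in the truncated equation for kappa_2."""
    poly = m**4 + 8.0 * m**3 + 30.0 * m**2 + 68.0 * m + 16.0
    return -2.0 * poly * math.exp(-m) / (m + 2.0) ** 5

m = 3.0
print(d2(chi, m), coeff_mean(m))
print(2.0 * d1(chi, m) + d2(psi, m), coeff_kappa2(m))

# Quasi-stationary value of kappa_2 at fixed m; it tends to 3 for large m.
kappa2_star = psi(100.0) / (-coeff_kappa2(100.0))
print("kappa_2 fixed point at m = 100:", kappa2_star)
```

The slow approach of the fixed point to 3 (corrections of relative order $1/m$) illustrates why the asymptotic constants are reached only at very large fitness.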

Clearly the above scheme cannot be applicable to all $g(w)$. For example, if $g(w)$
has a power law tail (17), $\kappa_n$ is infinite if $n > a$. However, even if $\kappa_n$ is well defined
for all $n$, it is not at all guaranteed that the solution for $\langle w\rangle$ becomes better as we take
more and more $\kappa_n$ into account. To clarify this point, let us think about the record
problem for the exponential distribution (2), which corresponds to $\pi(x) = \Theta(x)$ and is
exactly solvable. For this case (60) and (61) yield

$$\frac{dm}{d\tau} = \langle e^{-w}\rangle = e^{-m}\langle e^{-\delta w}\rangle \sim e^{-m}\sum_{k=0}^{\infty}(-1)^k\kappa_k, \tag{67}$$

$$\frac{d\kappa_n}{d\tau} = \sum_{r=0}^{n-1}\left\langle\frac{\delta w^{\,r}}{r!}\,e^{-w}\right\rangle - \kappa_{n-1}\langle e^{-w}\rangle, \tag{68}$$

where $\kappa_0 = 1$ and $\kappa_1 = 0$ by definition and $\sim$ means that the order of summation
and integration is interchanged (without any legitimation). The approximation scheme

Figure 8. The results of the approximation scheme for the mean ($C_{\mathrm{RP}}$ and $C_{\mathrm{DRP}}$)
and variance ($\kappa_2$) including terms up to $\kappa_\ell$ for the RP (left panel) and for the DRP
(right panel). Since we know the exact value of $C_{\mathrm{RP}}$ and $\kappa_2$ for the record problem, in
the left panel we compare the approximation to the exact values.



described above implies that we keep terms only up to $\ell$ in (67), (68). Now assume that
this solution becomes exact as $\ell \to \infty$. If this is true, we expect that $\kappa_k \to M_k/k!$ as
$\tau \to \infty$, where $M_k$ is defined in (88). Naively interchanging the order of summation and
integration, we then get

$$\langle e^{-\delta w}\rangle \sim \sum_{k=0}^{\infty}(-1)^k\kappa_k \sim \int_0^\infty \sum_{k=0}^{\infty}\frac{(\gamma + \ln x)^k}{k!}\,e^{-x}\,dx = e^{\gamma}, \tag{69}$$

which along with (67) gives the exact asymptotic behavior $m \sim \ln\tau + \gamma$. Similarly we
obtain by commuting summation and integration that

$$\sum_{r=0}^{n-1}\sum_{k=0}^{\infty}\binom{k+r}{r}(-1)^k\kappa_{r+k} \sim e^{\gamma}\sum_{r=0}^{n-1} I_r = \kappa_{n-1}\,e^{\gamma}, \tag{70}$$

where

$$I_r = \frac{1}{r!}\int_0^\infty (-\gamma - \ln x)^r\, x\,e^{-x}\,dx = \frac{M_r}{r!} - \frac{M_{r-1}}{(r-1)!} \tag{71}$$

for $r \ge 1$ and $I_0 = 1$. Thus it seems that our approximation scheme solves the problem
accurately as $\ell$ increases.

However, this agreement is in fact fortuitous. Equation (69) illustrates the problem.
One can easily see that $\langle e^{-\delta w}\rangle$ is indeed equal to the integral yielding $e^{\gamma}$, but the
intermediate series $S_\ell = \sum_{k=0}^{\ell}(-1)^k\kappa_k$ is actually not convergent. In fact, even if we
use the exact values for $\kappa_n$, $S_\ell$ oscillates between 1.500 34 and 2.0618 whose average is
1.781 07 $= e^{\gamma}$. We conclude, therefore, that the approximation scheme can at best be
expected to yield an asymptotic series.
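
The oscillation of the partial sums can be reproduced with a few lines of code. The sketch below is an illustration under the following assumptions: the exact record-process values are $\kappa_k = M_k/k!$ with $M_k = \int_0^\infty(-\gamma - \ln x)^k e^{-x}dx$, so that the $\kappa_k$ are the Taylor coefficients of $e^{-\gamma s}\Gamma(1-s)$; and the standard expansion $\ln\Gamma(1-s) = \gamma s + \sum_{n\ge 2}\zeta(n)s^n/n$ is used to generate them recursively. The helper names and the truncation of the zeta sum are choices made for the example.

```python
import math

EULER_GAMMA = 0.5772156649015329

def zeta(n, terms=1000):
    """Riemann zeta(n) for integer n >= 2: direct sum plus integral tail."""
    head = sum(j ** (-n) for j in range(1, terms + 1))
    tail = terms ** (1 - n) / (n - 1) - 0.5 * terms ** (-n)
    return head + tail

def kappa_exact(order):
    """kappa_0..kappa_order as Taylor coefficients of exp(-gamma s) Gamma(1-s).

    Uses k * kappa_k = sum_{n=2}^{k} zeta(n) * kappa_{k-n}, which follows from
    differentiating the exponential of the power series for ln Gamma(1-s).
    """
    z = [0.0, 0.0] + [zeta(n) for n in range(2, order + 1)]
    kappa = [1.0] + [0.0] * order
    for k in range(1, order + 1):
        kappa[k] = sum(z[n] * kappa[k - n] for n in range(2, k + 1)) / k
    return kappa

kappa = kappa_exact(41)
partial_sums = []
running = 0.0
for k, value in enumerate(kappa):
    running += (-1) ** k * value
    partial_sums.append(running)

print("kappa_2 =", kappa[2])            # pi^2 / 12
print("S_40 =", partial_sums[40])       # upper branch
print("S_41 =", partial_sums[41])       # lower branch
print("average =", 0.5 * (partial_sums[40] + partial_sums[41]))
print("exp(gamma) =", math.exp(EULER_GAMMA))
```

In this representation the non-convergence has a simple origin: the generating function has a pole at $s = 1$, so $\kappa_k$ tends to the constant $e^{-\gamma}$ and the alternating series is evaluated on the boundary of its circle of convergence.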

Despite this difficulty, we now demonstrate that the moment expansion yields
reliable results when used with caution. We have applied the scheme to the record
problem and the DRP with exponential $g(w)$. Assuming that all $\kappa_n$ saturate as
$\tau \to \infty$, the mean fitness then follows an equation of the form

$$\frac{dm}{d\tau} \approx C_{\mathrm{RP}}(\ell)\,e^{-m} \tag{72}$$

for the record process and

$$\frac{dm}{d\tau} \approx \frac{C_{\mathrm{DRP}}(\ell)}{m\,e^{m}} \tag{73}$$

for the DRP, respectively. The argument $\ell$ means that the constants $C_{\mathrm{RP}}$ and $C_{\mathrm{DRP}}$
are evaluated keeping the $\kappa_n$ up to $n = \ell$. For instance, we have already shown that
$C_{\mathrm{DRP}}(1) = 4$ from (45) and $C_{\mathrm{DRP}}(2) = 16$ from (66).

Figure 8 summarizes the results of the approximate evaluation of the $C$'s and of
$\kappa_2$ with increasing $\ell$. In the range $4 < \ell < 10$, the approximation yields rather stable
values. The comparison with the exact results for the record problem shows that the
method is excellent in this range. However, for $\ell$ larger than 10, the error becomes
uncontrollable. In fact, the solutions for $\ell > 10$ in figure 8 are meaningless because
some $\kappa_{2n}$'s are negative although they should be positive. The approximation suggests that
$C_{\mathrm{DRP}} \approx 8$, which gives a highly accurate estimate of the asymptotic fitness, see figure 6.
On the other hand, the estimate $\kappa_2 \approx 0.8$ appears to be somewhat smaller than the
simulation results, see the inset of the right panel in figure 5.

4.6. Finite U

Until now, the mutation probability U has been assumed to be very small. The natural 
extension of the previous study is to ask what will happen if the mutation rate is high, 
which is the topic of this subsection. 

One can get some insight from the infinite population calculation in section 3, where
it was shown that the frequency distribution can be approximated by a superposition
(27) of two distributions which are well separated from each other. This is still expected
to happen in the finite population case when the mean fitness is much larger than the
average fitness due to mutations [the average of $g(w)$]. Figure 9 depicts the cumulative
frequency distribution obtained from simulations of the WF model at t = 10^, 10^, and 
10^ for A^ = 10^ and U = 0.5, defined as 

$$F(w) = \left\langle\int_w^\infty f_t(x;N)\,dx\right\rangle \tag{74}$$

where $\langle\ldots\rangle$ means an average over independent samples and $f_t(w;N)$ is the frequency
distribution for population size $N$. As expected, the cumulative distribution displays a
plateau corresponding to the region of low probability between the two peaks, but in
contrast to the ansatz (27) the height of the plateau is below $U$. To get a quantitative
explanation of this effect, we assume that the frequency distribution is of the form
$f_t(w;N) = (1-\xi)g(w) + \xi\delta(w-m)$, where $\xi$ is the weight of the high fitness peak
which is approximated by a $\delta$-function when $m \gg 1$. Then the population fraction of
the genotype with fitness $m$ increases after selection to $\xi m/(\xi m + 1 - \xi)$, out of which
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Figure 9. The cumulative frequency distribution for N — 10^ and U — 0.5 at 
t = 10^, 10^, and 10'' (from left to right). The data are collected from 64 000 
independent simulations. The approximate expressions e{w) in (j76p and ec{w) in (|77p 
with ^ = 0.4725 are also drawn for comparison. Inset: same but the abscissa is shifted 
by the amount TO(r) which is the solution of ([73]) with Cdrp = 8. The data sets for 
t = 10'^ and t = lO'* show a nice collapse. 



only a fraction $1 - U$ remains after mutation. If $m$ and $\xi$ change slowly on the time
scale of one generation, one obtains the stationarity condition

$$\frac{(1-U)\,\xi m}{\xi m + 1 - \xi} = \xi \;\Rightarrow\; \xi = (1-U) - \frac{U}{m-1}, \tag{75}$$

which shows that $\xi$ approaches $1 - U$ only for $m \to \infty$. This suggests that the cumulative
distribution should take the form

$$F(w) \approx e(w) = (1-\xi)\,g(w) + \xi, \tag{76}$$

for w < m. For the case considered in figure [HI the mean fitness at t = 10^ is ~ 19.15 
which gives $\xi \approx 0.4725$ in good agreement with the simulation data. Due to the
logarithmic increase of $m$ with $t$ in the case of exponential $g(w)$, the approach to the
asymptotic value $\xi \to 1 - U$ is very slow. For example, by (75) reaching $\xi = 0.49$ requires
$m = 51$, which for our parameters lies far beyond the number of generations accessible to simulation.
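
The numbers quoted here follow directly from the stationarity condition. The short sketch below is only an illustration; the function names are hypothetical, and the closed form $\xi = 1 - U - U/(m-1)$ is obtained by dividing the stationarity condition $(1-U)\xi m/(\xi m + 1 - \xi) = \xi$ by $\xi$ and solving the resulting linear equation.

```python
def peak_weight(m, U):
    """Stationary weight xi of the high-fitness peak for peak fitness m."""
    return 1.0 - U - U / (m - 1.0)

def fitness_for_weight(xi, U):
    """Inverse relation: peak fitness m at which the weight equals xi."""
    return 1.0 + U / (1.0 - U - xi)

U = 0.5
m = 19.15
xi = peak_weight(m, U)

# Check that xi is indeed a fixed point of selection followed by mutation.
after_one_generation = (1.0 - U) * xi * m / (xi * m + 1.0 - xi)

m_needed = fitness_for_weight(0.49, U)
print("xi at m = 19.15:", xi)
print("after one generation:", after_one_generation)
print("m needed for xi = 0.49:", m_needed)
```

Because $m$ grows only logarithmically in time for exponential $g(w)$, the jump from $m \approx 19$ to $m \approx 51$ corresponds to an increase of the required time by many orders of magnitude.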

For a more accurate description of the high-fitness part of the frequency distribution
we assume that, as in the infinite population case, the travelling wave contribution in
the decomposition (27) becomes Gaussian at long times. If this is true, the cumulative
frequency distribution should take the form



F{w) ~ ec{w) 



e 



erfc 



w — m(T) 



(77) 



2 V 2^^^ 

where erfc is the complementary error function and $\kappa_2$ ($\approx 0.8$) is obtained from the
analysis of the DRP in the previous section. As shown in figure 9, $e_c(w)$ approximates

Figure 10. Plots of $w(t;U)/(1-U)$ vs $t$ for the data sets in the left panel of figure 5.
All curves now collapse in the asymptotic regime.



the numerical data quite well, and the distributions obtained at different times 10^ and 
10"^ collapse (inset of figure [9]). Though ec{w) is not excellent, we would say that it is a 
reasonable approximation in the asymptotic regime. 

As a consequence of these considerations, the effect of the mutational load on the
mean fitness is found to be remarkably simple: asymptotically we have

$$w(t;U) \approx \xi\,m(\tau) + (1-\xi) = (1-U)\,m(\tau), \tag{78}$$

where $m(\tau)$ is the mean fitness of the DRP. This prediction is confirmed in figure 10.
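
The last equality in this relation can be made explicit in two lines. The derivation below is a sketch under two assumptions: the mutation distribution has unit mean, as for the exponential $g(w) = e^{-w}$, and the peak weight takes its stationary value $\xi = 1 - U - U/(m-1)$.

```latex
\begin{align*}
w(t;U) &\approx \xi\, m + (1-\xi)\int_0^\infty w\, g(w)\, dw
        = \xi\,(m-1) + 1, \\
\xi\,(m-1) &= (1-U)(m-1) - U = (1-U)\,m - 1, \\
\Rightarrow\quad w(t;U) &\approx (1-U)\, m .
\end{align*}
```

The factor $1 - U$ is thus exact within the two-peak ansatz at every $m$, even though $\xi$ itself approaches $1 - U$ only slowly.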

5. Conclusions 

In this paper we have explored several aspects of the evolution of an asexually 
reproducing population in a random fitness landscape which (in the sense of the NK 
family [8], [19]) is maximally epistatic. In contrast to previous work on the REM fitness 
landscape [201 HH [231 [261 [27] we adopt the limit of infinite genome length, which leads 
to a finite population version of Kingman's house of cards model [28]. Although the 
model can hardly be expected to provide a realistic description of empirical fitness 
landscapes [HI [TT], it serves as a useful counterpart to the much studied non-epistatic 
multiplicative landscape model and can help to develop some intuition for the generic 
features of adaptation under strong epistasis. 

An example for such a generic feature is the slowing down of the speed of adaptation 
compared to the multiplicative model, which reflects the fact that the supply of beneficial 
mutations dwindles with increasing fitness. This effect is well documented in evolution 
experiments with microbial populations [IHl [IHl [50]; in fact, it has been argued [9] 
that the data obtained by Lenski and Travisano [38] for populations of E. coli can be
quantitatively described as a logarithmic increase in fitness, which would be consistent
with our finite population results for exponential $g(w)$.

We have extended Kingman's analysis of the infinite population limit to include 
the shape of the frequency distribution, which was found to become bimodal at long 
times. The finite population dynamics was shown to reduce to a diluted record process 
(DRP) for long times and small mutation probability $U \ll 1$. This representation
allows one to bound the mean population fitness from above by the standard record process
and from below by a mean field approximation to the DRP, which can be improved 
systematically through a moment expansion. Although connections between record 
statistics and evolutionary processes have been suggested before [3 [HI El [lOl [HI [32], we 
have here for the first time established a precise quantitative relation of the theory of 
records to one of the cornerstones of population genetics, the WF model. Finally, using 
insights from the infinite population case, we have shown that the fitness distribution at 
large U can be understood as a superposition of the DRP distribution and the mutation 
distribution $g(w)$.

A number of important questions concerning this model are left to future work. For 
example, it would be interesting to consider the finite population dynamics for bounded 
mutation distributions $g(w)$, for which the infinite population calculation predicts the
appearance of singularities in the stationary fitness distribution [28]. Furthermore, 
the temporal statistics of the fixation events should be investigated with regard to 
its relation to record dynamics, for which detailed results are available p3|,[3l], as well 
as in comparison to recent work on the corresponding issue for multiplicative fitness 
landscapes

Appendix A: Infinite population calculation for Gumbel-type $g(w)$

The first step is the calculation of the $t$-th moment of (16),

$$G_t = \int_0^\infty w^t g(w)\,dw = e\int_1^\infty (\ln y)^t e^{-y}\,dy. \tag{79}$$

To get an asymptotic expression, let us make a change of variables $y = xt/\ln t$:

$$G_t = e\,\frac{t}{\ln t}\left(\ln\frac{t}{\ln t}\right)^{t}\int_{\ln t/t}^{\infty} dx\,\exp\left(-\frac{t}{\ln t}\left\{x - \ln t\,\ln\left(1 + \frac{\ln x}{\ln(t/\ln t)}\right)\right\}\right). \tag{80}$$

Since $t$ is assumed to be very large, the integral is dominated by the regime where the
terms in the curly bracket attain a minimum which is easily seen to be unique. One can
show that the contribution from the boundaries of the integral is exponentially small
and the maximal contribution comes from the region around $x_c$ which is the solution of
the equation

$$(1 - x_c)\ln t = x_c(\ln x_c - \ln\ln t) \;\Rightarrow\; x_c \approx 1 + \frac{\ln\ln t}{\ln t} + O\!\left(\left(\frac{\ln\ln t}{\ln t}\right)^{2}\right). \tag{81}$$

Expanding the terms in the curly bracket around Xc and performing the Gaussian 
integral, we get 



«— »yi('"(i^))'-p(-i^^-'"""0 + S)})' ''" 

Hence in the long run, the Gumbel-type distribution also meets the criterion (TTT!) and 
the mean fitness becomes 

M t flnlnt ( (§lnlnt-l)\\ ^ ^ 

^,«»„(l-C/)ln-exp(^j^(^l + A^^-^jj, (83) 

which increases logarithmically for long times. 

Appendix B: Record dynamics 

Here we derive an expression for the record probability distribution $p_{\mathrm{RP}}(w,\tau)$ of the
population fitness at scaled time $\tau$, in the long time limit where the generation of new
random variables is equivalent to a Poisson process in continuous time [51]. During time
$\tau$, the probability that there are $n$ mutations is $e^{-\tau}\tau^n/n!$. Hence the probability for the
fitness at time $\tau$ to be larger than $x$ is

$$\mathrm{Prob}[w > x, \tau] = \sum_{n=0}^{\infty} e^{-\tau}\frac{\tau^n}{n!}\bigl(1 - (1 - Q(x))^n\bigr) = 1 - \exp(-\tau Q(x)), \tag{84}$$

where $Q(x) = \int_x^\infty dw\, g(w)$ is the cumulative mutation distribution and $x$ is larger than
the initial value $w_0$ of the process. In addition there is a contribution from the possibility
that no new record has appeared up to time $\tau$,

$$\mathrm{Prob}[w = w_0, \tau] = 1 - \mathrm{Prob}[w > w_0, \tau] = \exp(-\tau Q(w_0)). \tag{85}$$

Together the two contributions yield

$$p_{\mathrm{RP}}(w,\tau) = \tau g(w)\,e^{-\tau Q(w)}\,\Theta(w - w_0) + e^{-\tau Q(w_0)}\,\delta(w - w_0), \tag{86}$$

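from which moments and cumulants can be derived. For example, for the exponential
case with $g(w) = Q(w) = e^{-w}$ we obtain

$$\langle w\rangle_{\mathrm{RP}} - w_0\,e^{-\tau e^{-w_0}} = \int_{w_0}^\infty dw\; w\,\tau e^{-w} e^{-\tau e^{-w}} = \ln\tau + \gamma + O\!\left(e^{-\tau e^{-w_0}}\right), \tag{87}$$

$$\langle (w - \langle w\rangle_{\mathrm{RP}})^n\rangle_{\mathrm{RP}} = \int_0^\infty (-\gamma - \ln x)^n e^{-x}\,dx + O\!\left(e^{-\tau e^{-w_0}}\right) \equiv M_n, \tag{88}$$

where $\gamma$ ($\approx 0.5772$) is the Euler-Mascheroni constant. Hence

$$\langle w\rangle_{\mathrm{RP}} \approx \ln\tau + \gamma, \qquad \langle (w - \langle w\rangle_{\mathrm{RP}})^2\rangle_{\mathrm{RP}} \approx M_2 = \frac{\pi^2}{6}, \tag{89}$$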

which is consistent with extreme value statistics [in|. For a Gaussian g{w) = 2e~^ /v^, 
one finds 

/■erfc(«)o) /■t'o 

(w")rp ^ r / {eric-\y))''e-^ydy ^ / (In(cr) - Inx)""^^ e-'^dx 

Jo Jo 



Tt^, ^ \\IL 1 '^ Ziii . . \\IL 9/ 9 '' 



(In(cr))^ + 2 (In(c^))" 7 + ^^(In(cr))^-^ ( ^ + - 



(90) 



where erfc is the complementary error function erfc(x) = -^ J^°° e "^ dy and erfc (y) is 

its inverse function whose leading asymptotic behavior is (Inc — Iny)^/^ with c = A/2/7r 
when y -C 1. Hence, for the Gaussian case, 



(i^)rp ^ lnV2(cr), {{w - {w)^^r)^^ ^ -f^ (91) 

24 In(cr) 

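For a distribution with a power law tail (17), we have (for $n < a$)

$$\langle w^n\rangle_{\mathrm{RP}} \approx \int_{w_0}^{\infty} dw\;\tau a\,w^n (1+w)^{-a-1} e^{-\tau(1+w)^{-a}} \approx \tau^{n/a}\,\Gamma\!\left(\frac{a-n}{a}\right), \tag{92}$$

which yields (if both exist)

$$\langle w\rangle_{\mathrm{RP}} \sim \sqrt{\langle (w - \langle w\rangle_{\mathrm{RP}})^2\rangle_{\mathrm{RP}}} \sim \tau^{1/a}. \tag{93}$$

If $n > a$ in (92), $\langle w^n\rangle_{\mathrm{RP}}$ is infinite.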
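
The record-process results of this appendix for the exponential case are easy to test by direct sampling. The sketch below is a minimal illustration, assuming $g(w) = e^{-w}$, a unit-rate Poisson arrival of new fitness values in the scaled time $\tau$, and initial value $w_0 = 0$; the parameter values and function names are choices made for the example. It compares the sampled survival probability with $1 - \exp(-\tau Q(x))$ and the sampled mean and variance with $\ln\tau + \gamma$ and $\pi^2/6$.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329

def record_fitness(tau, w0, rng):
    """Largest of w0 and all fitness values created up to scaled time tau."""
    t, best = rng.expovariate(1.0), w0
    while t <= tau:
        best = max(best, rng.expovariate(1.0))   # new fitness value from g(w)
        t += rng.expovariate(1.0)                # waiting time to next arrival
    return best

rng = random.Random(3)                           # fixed seed for reproducibility
tau, w0, runs = 200.0, 0.0, 3000
samples = [record_fitness(tau, w0, rng) for _ in range(runs)]

mean = sum(samples) / runs
variance = sum((w - mean) ** 2 for w in samples) / runs

x = 6.0
survival_sampled = sum(1 for w in samples if w > x) / runs
survival_exact = 1.0 - math.exp(-tau * math.exp(-x))

print("mean:", mean, "expected:", math.log(tau) + EULER_GAMMA)
print("variance:", variance, "expected:", math.pi ** 2 / 6.0)
print("Prob[w > 6]:", survival_sampled, "expected:", survival_exact)
```

The same sampler applies to other mutation distributions after replacing the draw of the new fitness value, although for power law tails with small $a$ the sample mean converges slowly or not at all.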
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